R is a functional programming language. This means that functions are "objects", just like data frames, vectors, and other things that are assigned to variables and passed to other functions.
November 14th, 2018
R is a functional programming language. This means that functions are "objects", just like data frames, vectors, and other things that are assigned to variables and passed to other functions.
The name of a functions is actually the name of a variable that contains the function, in the same way that the
log
## function (x, base = exp(1)) .Primitive("log")
This means that we can create a copy of a function by assigning its value to a new variable.
myLogFunction <- log myLogFunction
## function (x, base = exp(1)) .Primitive("log")
myNumber <- 7 class(myNumber)
## [1] "numeric"
class(log)
## [1] "function"
We can create a new function using the word "function" followed by the functions arguments and one or more R statements.
myDumbFunction <- function() 42 myDumbFunction()
## [1] 42
If there is more than one statement in a function, they should be enclosed in curly brackets:
doubleIt <- function(x) {
myResult <- x * 2
myResult
}
doubleIt(5)
## [1] 10
The last statement within the curly brackets will be the value returned by the function.
Inside a function, variables that existed in your environment can be used and even changed. However, any changes made, including changing data stored in variables and creating new variables, happens solely within the function. Your environment stays the same.
exists("myResult")
## [1] FALSE
myResult <- 1000 doubleItOutput <- doubleIt(2) myResult
## [1] 1000
The data set used in today's lecture comes from an siRNA screen that we published a few years ago. The screen looked for genes that influence parkin translocation.
High-content genome-wide RNAi screens identify regulators of parkin upstream of mitophagy. Hasson SA, Kane LA, Yamano K, Huang CH, Sliter DA, Buehler E, Wang C, Heman-Ackah SM, Hessa T, Guha R, Martin SE, Youle RJ. Nature. 2013.
The data set will be available for download from the lectures portion of the class web page.
library(readxl)
ambion <- read_excel("lecture9_files/nature.parkin.gw.xlsx", skip = 3)
str(ambion)
## Classes 'tbl_df', 'tbl' and 'data.frame': 65196 obs. of 25 variables: ## $ Vendor Supplied Gene Symbol : chr "LMAN1" "MED15" "PFN2" "KIAA1191" ... ## $ Sample : num 8.91 7.92 7.97 11.2 8.88 ... ## $ Median Negative Control on Plate : num 38.6 33.3 31.4 43.8 34.3 ... ## $ Median Positive Control on Plate : num 96.7 96.7 97.1 96.5 96.1 ... ## $ PPT Sample as Percentage of Negative Control : num 23.1 23.7 25.4 25.6 25.9 ... ## $ PPT MAD Z-Score : num -2.42 -2.4 -2.36 -2.35 -2.35 ... ## $ PPT MAD Log MAD Z-Score : num -4.82 -4.74 -4.54 -4.52 -4.49 ... ## $ Median PPT Log Mad Z-Score for all siRNAs having the same seed sequence: num -2.056 -1.421 -1.421 -1.388 -0.485 ... ## $ Number of siRNAs having the same seed sequence : num 23 152 152 28 18 28 152 28 75 18 ... ## $ Cell Count, Sample : num 1286 1203 1177 1060 1272 ... ## $ Median Negative Control Cell Count on Plate : num 1278 1310 1260 1344 1256 ... ## $ Median Positive Control Cell Count on Plate : num 1030 1050 978 1106 985 ... ## $ Sample Cell Count, MAD Z-Score Normalized to Negative Contol : num 1.109 0.241 0.399 -1.027 1.168 ... ## $ Sample__1 : num 8.94 16.79 11.3 11.13 12.89 ... ## $ Median Negative Control Mitophagy on Plate : num 10.7 28.1 29 15.5 12.1 ... ## $ Median Positive Control Mitophagy on Plate : num 22.4 49 50.1 27.7 22 ... ## $ Sample Mitophagy, MAD Z-Score Normalized to Negative Control : num 0.0466 -0.8887 -1.7 -0.4281 0.9416 ... ## $ HUGO Gene Symbol : chr "LMAN1" "PCQAP" "PFN2" "KIAA1191" ... ## $ Entrez GeneID : num 3998 51586 5217 57179 284221 ... ## $ REFSEQ : chr "NM_005570" "NM_001003891" "NM_002628" "NM_001079684" ... ## $ DESCRIPTION : chr "lectin, mannose-binding, 1" "mediator complex subunit 15" "profilin 2" "KIAA1191" ... ## $ PLATE ID : chr "QDA101771" "QDA101996" "QDA103288" "QDA103435" ... ## $ Row Number : num 1 15 14 16 1 2 1 16 10 1 ... ## $ Column Number : num 4 21 19 11 12 1 4 16 1 14 ... ## $ Ambion siRNA ID : chr "s8218" "s28366" "s10379" "s226909" ...
apply(ambion, 2, function(x) sum(is.na(x)))
## Vendor Supplied Gene Symbol ## 441 ## Sample ## 441 ## Median Negative Control on Plate ## 441 ## Median Positive Control on Plate ## 441 ## PPT Sample as Percentage of Negative Control ## 441 ## PPT MAD Z-Score ## 441 ## PPT MAD Log MAD Z-Score ## 441 ## Median PPT Log Mad Z-Score for all siRNAs having the same seed sequence ## 441 ## Number of siRNAs having the same seed sequence ## 441 ## Cell Count, Sample ## 441 ## Median Negative Control Cell Count on Plate ## 441 ## Median Positive Control Cell Count on Plate ## 441 ## Sample Cell Count, MAD Z-Score Normalized to Negative Contol ## 441 ## Sample__1 ## 441 ## Median Negative Control Mitophagy on Plate ## 441 ## Median Positive Control Mitophagy on Plate ## 441 ## Sample Mitophagy, MAD Z-Score Normalized to Negative Control ## 441 ## HUGO Gene Symbol ## 0 ## Entrez GeneID ## 441 ## REFSEQ ## 441 ## DESCRIPTION ## 441 ## PLATE ID ## 441 ## Row Number ## 441 ## Column Number ## 441 ## Ambion siRNA ID ## 441
ambion[is.na(ambion[,1]),][1,]
## Vendor Supplied Gene Symbol Sample Median Negative Control on Plate ## 64756 <NA> NA NA ## Median Positive Control on Plate ## 64756 NA ## PPT Sample as Percentage of Negative Control PPT MAD Z-Score ## 64756 NA NA ## PPT MAD Log MAD Z-Score ## 64756 NA ## Median PPT Log Mad Z-Score for all siRNAs having the same seed sequence ## 64756 NA ## Number of siRNAs having the same seed sequence Cell Count, Sample ## 64756 NA NA ## Median Negative Control Cell Count on Plate ## 64756 NA ## Median Positive Control Cell Count on Plate ## 64756 NA ## Sample Cell Count, MAD Z-Score Normalized to Negative Contol ## 64756 NA ## Sample__1 Median Negative Control Mitophagy on Plate ## 64756 NA NA ## Median Positive Control Mitophagy on Plate ## 64756 NA ## Sample Mitophagy, MAD Z-Score Normalized to Negative Control ## 64756 NA ## HUGO Gene Symbol Entrez GeneID REFSEQ DESCRIPTION PLATE ID ## 64756 PRKCE NA <NA> <NA> <NA> ## Row Number Column Number Ambion siRNA ID ## 64756 NA NA <NA>
ambion <- ambion[! is.na(ambion[,2]), ] apply(ambion, 2, function(x) sum(is.na(x)))
## Vendor Supplied Gene Symbol ## 0 ## Sample ## 0 ## Median Negative Control on Plate ## 0 ## Median Positive Control on Plate ## 0 ## PPT Sample as Percentage of Negative Control ## 0 ## PPT MAD Z-Score ## 0 ## PPT MAD Log MAD Z-Score ## 0 ## Median PPT Log Mad Z-Score for all siRNAs having the same seed sequence ## 0 ## Number of siRNAs having the same seed sequence ## 0 ## Cell Count, Sample ## 0 ## Median Negative Control Cell Count on Plate ## 0 ## Median Positive Control Cell Count on Plate ## 0 ## Sample Cell Count, MAD Z-Score Normalized to Negative Contol ## 0 ## Sample__1 ## 0 ## Median Negative Control Mitophagy on Plate ## 0 ## Median Positive Control Mitophagy on Plate ## 0 ## Sample Mitophagy, MAD Z-Score Normalized to Negative Control ## 0 ## HUGO Gene Symbol ## 0 ## Entrez GeneID ## 0 ## REFSEQ ## 0 ## DESCRIPTION ## 0 ## PLATE ID ## 0 ## Row Number ## 0 ## Column Number ## 0 ## Ambion siRNA ID ## 0
Often it will be helpful to create a new data frame with only the data we wish to analyze.
ambion.simple <- ambion[,c(19,25,1,21,7,13,17)] ambion.simple[1,]
## Entrez GeneID Ambion siRNA ID Vendor Supplied Gene Symbol ## 1 3998 s8218 LMAN1 ## DESCRIPTION PPT MAD Log MAD Z-Score ## 1 lectin, mannose-binding, 1 -4.82223 ## Sample Cell Count, MAD Z-Score Normalized to Negative Contol ## 1 1.108518 ## Sample Mitophagy, MAD Z-Score Normalized to Negative Control ## 1 0.04662911
library(knitr, quietly = TRUE)
colnames(ambion.simple) <- c("GeneID","siRNA","Symbol","Description",
"PPT","Cells","Mitophagy")
kable(head(ambion.simple, n=4), format = "markdown")
| GeneID | siRNA | Symbol | Description | PPT | Cells | Mitophagy |
|---|---|---|---|---|---|---|
| 3998 | s8218 | LMAN1 | lectin, mannose-binding, 1 | -4.822230 | 1.1085178 | 0.0466291 |
| 51586 | s28366 | MED15 | mediator complex subunit 15 | -4.739415 | 0.2405872 | -0.8887362 |
| 5217 | s10379 | PFN2 | profilin 2 | -4.539309 | 0.3994161 | -1.7000123 |
| 57179 | s226909 | KIAA1191 | KIAA1191 | -4.516443 | -1.0274124 | -0.4280631 |
library(ggplot2, quietly = TRUE) ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_point(alpha=0.5)
ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_point(alpha=0.5) +
geom_point(data = ambion.simple[ambion.simple$Symbol == "PARK2",],
color="blue")
ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_density2d() +
geom_point(data = ambion.simple[ambion.simple$Symbol == "PARK2",],
color="blue")
ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_density2d() +
geom_point(data = ambion.simple[ambion.simple$Symbol == "PARK2",],
color="red", shape=17, size =5) +
ggtitle("PARK2")
description <- ambion.simple$Description[ambion.simple$Symbol == "PARK2"][1] description
## [1] "Parkinson disease (autosomal recessive, juvenile) 2, parkin"
myTitle <- paste("PARK2",description,sep=": ")
myTitle
## [1] "PARK2: Parkinson disease (autosomal recessive, juvenile) 2, parkin"
ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_density2d() +
geom_point(data = ambion.simple[ambion.simple$Symbol == "PARK2",],
color="red", shape=17, size =5) +
ggtitle(myTitle)
Now that we have our custom plot looking right, we would like to be able to do the same for other genes but without so much typing. First, make a new R Script in RStudio:
Frequently, making a function will simply be a function of selecting the right parts of your history and hitting the "to source" button.
graphGene <- function(gene) {
description <- ambion.simple$Description[ambion.simple$Symbol == "PARK2"][1]
myTitle <- paste("PARK2",description,sep=": ")
ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_density2d() +
geom_point(data = ambion.simple[ambion.simple$Symbol == "PARK2",],
color="red", shape=17, size =5) +
ggtitle(myTitle)
}
graphGene <- function(gene) {
description <- ambion.simple$Description[ambion.simple$Symbol == gene][1]
myTitle <- paste(gene,description,sep=": ")
ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_density2d() +
geom_point(data = ambion.simple[ambion.simple$Symbol == gene,],
color="red", shape=17, size =5) +
ggtitle(myTitle)
}
graphGene("PINK1")
pdfGene <- function(gene, file=paste(gene,".pdf", sep="")) {
pdf(file, width=5, height=5)
graphGene(gene)
dev.off()
}
We can use the ellipse notation (…) to indicate that extra arguments to our function should be passed on to a function that is inside our function (in this case pdf).
pdfGene <- function(gene, file=paste(gene,".pdf", sep=""), ...) {
pdf(file, ...)
graphGene(gene)
dev.off()
}
pdfGene("PINK1", width=10, height=10)
## quartz_off_screen ## 2
We can decide whether something happens in our function using "if" and "if/else".
sillyFunction <- function(x) {
if (x < 5) {
returnValue <- x
}
else {
returnValue <- x / 2
}
returnValue
}
sillyFunction(12)
## [1] 6
pdfGenes <- function(genes, file=paste(gene,".pdf", sep=""), ...) {
pdf(file, ...)
for (gene in genes) {
graphGene(gene)
}
dev.off()
}
pdfGenes(c("PLK1","PINK1","BRCA1"))